 data manipulation

Query-Efficient Adversarial Attack Against Vertical Federated Graph Learning

arXiv.org Artificial Intelligence

Graph neural networks (GNNs) have attracted wide attention for their ability to learn representations of graph-structured data. However, distributed data silos limit GNN performance. Vertical federated learning (VFL), an emerging technique for processing distributed data, makes it possible for GNNs to handle distributed graph-structured data. Despite the rapid development of vertical federated graph learning (VFGL), its robustness against adversarial attacks has not yet been explored. Although numerous adversarial attacks against centralized GNNs have been proposed, their attack performance is challenged in the VFGL scenario. To the best of our knowledge, this is the first work to explore adversarial attacks against VFGL. We propose a query-efficient hybrid adversarial attack framework, denoted NA2 (short for Neuron-based Adversarial Attack), that significantly improves centralized adversarial attacks against VFGL. Specifically, a malicious client manipulates its local training data to improve its contribution in a stealthy fashion. A shadow model is then built on the manipulated data to simulate the behavior of the server model in VFGL. As a result, the shadow model can improve the attack success rate of various centralized attacks with only a few queries. Extensive experiments on five real-world benchmarks demonstrate that NA2 improves the performance of centralized adversarial attacks against VFGL, achieving state-of-the-art performance even under a potential adaptive defense in which the defender knows the attack method. Additionally, we provide interpretability experiments on the effectiveness of NA2 via sensitive-neuron identification and t-SNE visualization.


Truthful Dataset Valuation by Pointwise Mutual Information

arXiv.org Artificial Intelligence

A common way to evaluate a dataset in ML involves training a model on this dataset and assessing the model's performance on a test set. However, this approach has two issues: (1) it may incentivize undesirable data manipulation in data marketplaces, as the self-interested data providers seek to modify the dataset to maximize their evaluation scores; (2) it may select datasets that overfit to potentially small test sets. We propose a new data valuation method that provably guarantees the following: data providers always maximize their expected score by truthfully reporting their observed data. Any manipulation of the data, including but not limited to data duplication, adding random data, data removal, or re-weighting data from different groups, cannot increase their expected score. Our method, following the paradigm of proper scoring rules, measures the pointwise mutual information (PMI) of the test dataset and the evaluated dataset. However, computing the PMI of two datasets is challenging. We introduce a novel PMI measuring method that greatly improves tractability within Bayesian machine learning contexts. This is accomplished through a new characterization of PMI that relies solely on the posterior probabilities of the model parameter at an arbitrarily selected value. Finally, we support our theoretical results with simulations and further test the effectiveness of our data valuation method in identifying the top datasets among multiple data providers. Interestingly, our method outperforms the standard approach of selecting datasets based on the trained model's test performance, suggesting that our truthful valuation score can also be more robust to overfitting.
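For readers unfamiliar with the quantity named in the abstract above, the following is a minimal sketch of the textbook definition of pointwise mutual information for two discrete events, PMI(x; y) = log(p(x, y) / (p(x) p(y))). It is not the paper's dataset-level estimator, which relies on posterior probabilities of model parameters; the function name `pmi` and the probability values are hypothetical and chosen only for illustration.

```python
import math


def pmi(p_xy: float, p_x: float, p_y: float) -> float:
    """Textbook pointwise mutual information of two events, in nats.

    p_xy is the joint probability, p_x and p_y are the marginals.
    This is the standard definition, not the paper's estimator.
    """
    if min(p_xy, p_x, p_y) <= 0.0:
        raise ValueError("all probabilities must be strictly positive")
    return math.log(p_xy / (p_x * p_y))


# Illustrative (made-up) probabilities.
# Independent events: the joint equals the product of the marginals.
independent = pmi(p_xy=0.25, p_x=0.5, p_y=0.5)

# Positively associated events: the joint exceeds the product of the marginals.
associated = pmi(p_xy=0.4, p_x=0.5, p_y=0.5)

print(independent, associated)
```

A PMI of zero means the two events carry no information about each other, while a positive value means they co-occur more often than independence would predict.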


Enhancement attacks in biomedical machine learning

arXiv.org Artificial Intelligence

The prevalence of machine learning in biomedical research is rapidly growing, yet the trustworthiness of such research is often overlooked. While some previous works have investigated the ability of adversarial attacks to degrade model performance in medical imaging, the ability to falsely improve performance via recently developed "enhancement attacks" may be a greater threat to biomedical machine learning. In the spirit of developing attacks to better understand trustworthiness, we developed two techniques to drastically enhance the prediction performance of classifiers with minimal changes to features: 1) general enhancement of prediction performance, and 2) enhancement of a particular method over another. Our enhancement framework falsely improved classifiers' accuracy from 50% to almost 100% while maintaining high feature similarity between original and enhanced data (Pearson's r > 0.99). Similarly, the method-specific enhancement framework was effective in falsely improving the performance of one method over another. For example, a simple neural network outperformed logistic regression by 17% on our enhanced dataset, although no performance differences were present in the original dataset. Crucially, the original and enhanced data were still similar (r = 0.99). Our results demonstrate the feasibility of minor data manipulations to achieve any desired prediction performance, which presents an interesting ethical challenge for the future of biomedical machine learning. These findings emphasize the need for more robust data provenance tracking and other precautionary measures to ensure the integrity of biomedical machine learning research.


Roadmap to Getting into Data Science

#artificialintelligence

Getting started with data science can be a confusing journey, especially if you do not come from a STEM field. In this article, I explore and define the essential aspects of data science you need to get started correctly. The article mainly covers the technical skills a data scientist requires: to become one, you need to be familiar with programming, statistics, and machine learning. It outlines the steps you can take to become a data scientist and the important libraries you need to know.


Top Five Essential Skills to Master in Artificial Intelligence

#artificialintelligence

Artificial intelligence (AI) is rapidly transforming industries around the world, from health care to finance to retail. As AI becomes more prevalent, the demand for professionals with AI skills is only expected to grow. In this blog post, we highlight five essential skills that every AI professional should master in order to succeed in this rapidly evolving field. From machine learning and deep learning to data manipulation and problem-solving, these skills will give you the foundation you need to build and work with AI systems. Let's dive in and explore these essential AI skills in more detail.


Python for Data Science: A Look at the Top Libraries

#artificialintelligence

Python is a popular language for data science due to its powerful libraries and tools for data manipulation, visualization, machine learning, and statistical analysis. In this listicle, we will introduce some of the top Python libraries for data science and provide a quick way to get started with them. NumPy is a library for working with large, multi-dimensional arrays and matrices of numerical data. It provides functions for performing mathematical operations on arrays, such as linear algebra, statistical analysis, and random number generation. pandas provides functions for reading in data from various sources, cleaning and wrangling data, and performing aggregations and transformations. Matplotlib is a library for creating static, animated, and interactive visualizations in Python.
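As a quick illustration of the kinds of operations the excerpt above describes, here is a minimal sketch that uses NumPy for array math and pandas for a grouped aggregation. It assumes both libraries are installed; the array values, column names, and variable names are made up for illustration and do not come from the article.

```python
import numpy as np
import pandas as pd

# NumPy: a small 2-D array with a column-wise statistic and a matrix product.
values = np.array([[1.0, 2.0], [3.0, 4.0]])
col_means = values.mean(axis=0)   # mean of each column
product = values @ values         # matrix multiplication (linear algebra)

# pandas: a tiny table with a per-group aggregation.
df = pd.DataFrame(
    {
        "group": ["a", "a", "b"],   # hypothetical category labels
        "score": [1.0, 3.0, 5.0],   # hypothetical measurements
    }
)
group_means = df.groupby("group")["score"].mean()

print(col_means)
print(product)
print(group_means)
```

The same pattern scales to real data: load a table with one of pandas' readers, clean it, then hand the numeric columns to NumPy or a plotting library such as Matplotlib.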